If you want to find the position of a match in an incoming string, simply check the length of $` (That's \(PREMATCH if you've chosen to use English;) to check where it starts, and add the length of \)& (that's $MATCH) to find where it ends.
Let's say I want to find all the URLs refered to in a web page that's loaded into the variable $html. I could write:
1 2 | push @section ,[ length ($`), length ($&), $1 ] while ( $html =~ m/(https?://[^ >"]+)/g); |
and that will give me a list of 3-element lists containing start point, length and actual string matched. Here's the code to display that list:
1 2 3 | foreach $element ( @section ) { print ( join ( ", " ,@ $element ), "\n" ); } |
and here's some of the results from the sources of our resources index
5979, 36, http://www.wellho.net/forum/top.html 6967, 36, http://www.wellho.net/net/mouth.html 7059, 42, http://www.wellho.net/downloads/index.html 8369, 67, http://www.wellho.net/mouth/387_Training-course-plans-for-2006.html 9365, 43, http://www.trainingcenter.co.uk/travel.html 9516, 45, http://reiseauskunft.bahn.de/bin/query.exe/en 9599, 59, http://www.livedepartureboards.co.uk/ldb/summary.aspx?T=MKM 9861, 48, https://lightning.he.net/~wellho/net/secure.html
P.S. I loaded my whole web page into a single variable using the code
1 2 3 | open (FH, "/Library/WebServer/live_html/resources/index.html" ); undef $/; $html = <FH>; |
So, the whole script could be:
1 2 3 4 5 6 7 8 9 10 | open FH, $ARGV [0] or die $!; undef $/; $html = <FH>; push @section ,[ length ($`), length ($&), $1 ] while ( $html =~ m!(https?://[^ >"]+)!g); foreach $element ( @section ) { print ( join ( ", " ,@ $element ), "\n" ); } |
Hide Comments